A New Visual Speech Recognition Approach for RGB-D Cameras
Authors
Abstract
Visual speech recognition remains challenging because speaking characteristics vary. This paper proposes a new lipreading approach that recognizes isolated speech segments (words, digits, phrases, etc.) using both 2D image and depth data. The proposed system runs in three consecutive steps: mouth region tracking and extraction, computation of motion and appearance descriptors (HOG and MBH), and classification with a Support Vector Machine (SVM). The approach was evaluated on three public databases (MIRACL-VC, OuluVS, and CUAVE) in both speaker-dependent and speaker-independent settings. The recognition results show that lipreading can be performed effectively: the proposed approach outperforms recent works in the literature in the speaker-dependent setting and is competitive in the speaker-independent setting.
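As a rough illustration of the appearance-descriptor step, the sketch below builds a gradient-orientation histogram for a mouth-region crop and concatenates the histograms of the intensity and depth channels. It is a simplified, single-cell stand-in for HOG (no cell grid or block normalization), not the authors' implementation; the function names, the 9-bin setting, and the toy 6x6 inputs are all assumptions made for the example. MBH and the SVM classifier are not shown.

```python
import math

def orientation_histogram(img, n_bins=9):
    """Single-cell histogram of unsigned gradient orientations (0-180 degrees),
    weighted by gradient magnitude and L2-normalized.
    Hypothetical helper: a simplification of HOG, not the paper's code."""
    h, w = len(img), len(img[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences on interior pixels.
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra modulo guards against angle rounding up to 180.0.
            hist[int(angle // bin_width) % n_bins] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist

def rgbd_descriptor(gray, depth, n_bins=9):
    """Concatenate the intensity and depth histograms into one feature vector."""
    return orientation_histogram(gray, n_bins) + orientation_histogram(depth, n_bins)

# Toy mouth-region crops (assumed data): a horizontal intensity ramp
# and a vertical depth ramp.
gray = [[float(x) for x in range(6)] for _ in range(6)]
depth = [[float(y) for _ in range(6)] for y in range(6)]

features = rgbd_descriptor(gray, depth)
print(len(features))
```

In a full pipeline, one such vector per frame (or per video, after temporal pooling) would be passed to the classifier; how the paper combines the 2D and depth descriptors is not stated in the abstract, so plain concatenation here is only one plausible choice.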
Similar resources
Bilinear Heterogeneous Information Machine for RGB-D Action Recognition
This paper proposes a novel approach to action recognition from RGB-D cameras, in which depth features and RGB visual features are jointly used. Rich heterogeneous RGB and depth data are effectively compressed and projected to a learned shared space, in order to reduce noise and capture useful information for r...
Introduction to the special issue on visual understanding and applications with RGB-D cameras
The prevalence of affordable RGB-D cameras, such as Microsoft's Kinect and ASUS's Xtion Pro Live sensors, is reshaping the landscape of computer vision and vision-related research. The pixel-level depth and visual (RGB) information provided by an RGB-D camera not only enables robust vision applications but also opens up new research problems and opportunities across a wide range of...
RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments
RGB-D cameras (such as the Microsoft Kinect) are novel sensing systems that capture RGB images along with per-pixel depth information. In this paper we investigate how such cameras can be used for building dense 3D maps of indoor environments. Such maps have applications in robot navigation, manipulation, semantic mapping, and telepresence. We present RGB-D Mapping, a full 3D mapping system tha...
Robust Intrinsic and Extrinsic Calibration of RGB-D Cameras
Color-depth cameras (RGB-D cameras) have become the primary sensors in most robotics systems, from service robotics to industrial robotics applications. Typical consumer-grade RGB-D cameras are provided with a coarse intrinsic and extrinsic calibration that generally does not meet the accuracy requirements needed by many robotics applications (e.g., high-accuracy 3D environment reconstruction an...
Multimodal Signal Processing and Learning Aspects of Human-Robot Interaction for an Assistive Bathing Robot
We explore new aspects of assistive living on smart human-robot interaction (HRI) that involve automatic recognition and online validation of speech and gestures in a natural interface, providing social features for HRI. We introduce a whole framework and resources of a real-life scenario for elderly subjects supported by an assistive bathing robot, addressing health and hygiene care issues. We...